Analysis of Software Project Reports for Defect Prediction Using KNN
Abstract
Defect severity assessment is essential for software practitioners because it lets them focus their attention and resources on higher-priority defects. It directly affects resource allocation and the planning of subsequent defect-fixing activities. In this paper, we build a model that assigns a severity level to each defect found during testing. The model combines text mining with machine learning: we use the k-nearest neighbors (KNN) method, applied to an open-source NASA dataset available in the PITS database. The Area Under the Curve (AUC) obtained from Receiver Operating Characteristic (ROC) analysis is the performance measure used to validate and analyze the results. KNN performs exceptionally well at predicting defect severity when the top 100 words are used as features, across all severity levels. Its performance is lowest with the top 5 words, better with the top 25 words, and better still with the top 50 words. These results make it reasonable to claim that the performance of KNN depends on the number of words selected as independent features: as the number of words increases, KNN's performance improves. In addition, KNN works best for medium-severity defects compared with defects of the other severity levels.
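The abstract describes the pipeline only at a high level: the top-N words in the defect reports become features, a KNN classifier assigns a severity, and each severity level is scored with ROC AUC. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' implementation. Ranking words by document frequency, term-frequency vectors, Euclidean distance, the neighbor-vote score, one-vs-rest AUC, and every function name (`tokenize`, `top_n_words`, `vectorize`, `knn_score`, `auc`, `evaluate`) are hypothetical choices the abstract does not specify. The toy reports and the "high"/"low" labels stand in for the PITS data.

```python
import math
import re
from collections import Counter


def tokenize(report: str) -> list[str]:
    """Lowercase a defect report and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", report.lower())


def top_n_words(reports: list[str], n: int) -> list[str]:
    """Return the n words that appear in the most training reports.

    Assumption: words are ranked by document frequency, ties broken
    alphabetically; the abstract does not state the ranking criterion.
    """
    doc_freq = Counter()
    for report in reports:
        doc_freq.update(set(tokenize(report)))
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def vectorize(report: str, vocab: list[str]) -> list[int]:
    """Term-frequency vector of a report over the selected words only."""
    counts = Counter(tokenize(report))
    return [counts[word] for word in vocab]


def knn_score(train_x, train_y, x, k: int, severity: str) -> float:
    """Fraction of the k nearest training reports labelled `severity`.

    Assumption: Euclidean distance on term-frequency vectors; distance ties
    are broken by comparing labels, so the result is deterministic.
    """
    neighbors = sorted(
        (math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y)
    )[:k]
    return sum(1 for _, ty in neighbors if ty == severity) / len(neighbors)


def auc(scores: list[float], is_positive: list[bool]) -> float:
    """One-vs-rest ROC AUC: probability that a random positive report
    scores higher than a random negative one (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(train_reports, train_labels, test_reports, test_labels,
             n_words: int, k: int = 3) -> dict[str, float]:
    """Per-severity AUC of a KNN model using the top n_words as features."""
    vocab = top_n_words(train_reports, n_words)
    train_x = [vectorize(r, vocab) for r in train_reports]
    test_x = [vectorize(r, vocab) for r in test_reports]
    results = {}
    for severity in sorted(set(test_labels)):
        scores = [knn_score(train_x, train_labels, x, k, severity) for x in test_x]
        results[severity] = auc(scores, [y == severity for y in test_labels])
    return results


if __name__ == "__main__":
    # Hypothetical toy reports standing in for PITS defect descriptions.
    train_reports = [
        "crash on startup data loss",
        "system crash memory corruption",
        "typo in label text",
        "minor typo in help text",
    ]
    train_labels = ["high", "high", "low", "low"]
    test_reports = ["crash with data corruption", "typo in menu text"]
    test_labels = ["high", "low"]
    print(evaluate(train_reports, train_labels, test_reports, test_labels, n_words=4))
    # {'high': 1.0, 'low': 1.0}
```

Calling `evaluate` with `n_words` set to 5, 25, 50 and 100 mirrors the feature-count comparison described in the abstract; on real PITS reports the AUC values would come from the data, not from this toy example.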